A new synthesis model for an allophone based text-to-speech system

Authors

  • Lou Boves
  • Joop Kerkhoff
  • H. Loman
Abstract

Although electronic speech synthesis now has a tradition of several decades, there is still no agreement on the best structure for a speech synthesizer. In this paper we compare several structures that have been used by workers in the field. As all of these appear to have drawbacks, we propose an alternative structure that should solve at least some of the problems. The single most important premise underlying our work is that developing synthesis rules becomes much easier and less time-consuming if optimal use can be made of existing phonetic knowledge. This knowledge is formulated either in terms of articulatory postures and movements or in terms of formant patterns. With recourse to the acoustic theory of speech production [1,2], it is not too difficult to translate articulatory data into formant patterns. The transformation of formant patterns into articulatory configurations is more difficult; moreover, the result is not necessarily unique [3]. This is one reason why articulatory synthesis has received much less attention than formant synthesis or terminal analog synthesis. In this paper the discussion is restricted to terminal analog synthesis and, more specifically, to formant synthesis. The use of linear prediction parameters such as reflection coefficients or Log Area Ratios is not considered because, regardless of their suggestive names, the relation of these parameters to actual vocal tract configurations is, at best, disputable.
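As a rough illustration of the articulatory-to-formant direction mentioned in the abstract, the simplest textbook case from the acoustic theory of speech production is a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / (4L). The sketch below is not taken from the paper; the function name, the tube length of 0.175 m, and the speed of sound of 350 m/s are assumptions chosen for illustration.

```python
def uniform_tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonance frequencies (Hz) of a uniform tube closed at one end.

    Quarter-wavelength resonator: F_k = (2k - 1) * c / (4 * L).
    Assumed values: c = 350 m/s, and a lossless tube of constant
    cross-section (a crude model of a neutral vowel).
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [
        (2 * k - 1) * speed_of_sound / (4.0 * length_m)
        for k in range(1, n_formants + 1)
    ]


# Assumed vocal tract length of 17.5 cm.
formants = uniform_tube_formants(0.175)
print([round(f) for f in formants])
```

With these assumed values the first three resonances come out near 500, 1500, and 2500 Hz. The inverse direction is harder, as the abstract notes: many different tube shapes can produce the same few formant frequencies.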

Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the main obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Allophone Synthesis Using A Neural Network

Most people reading this paper will be aware of the NETalk system of Sejnowski and Rosenberg [1], in which a multi-layer perceptron was trained to select the correct allophone for combinations of letters occurring in plain English text. Once suitable allophones have been selected, the problem remains of how the sounds corresponding to a sequence of allophones should be produced. The most straig...

From diphones to allophones: from data to rules

A research project is presented in which we aim to design a speech synthesis model based on both the diphone and the allophone concepts, i.e. the data-driven and the rule-driven approach to speech synthesis, respectively. At present, diphone concatenation for Dutch leads to more intelligible speech than rule-based allophone synthesis, although the latter synthesis has the theoret...

Synthesizing and evaluating an artificial language: klingon

The synthesis of an artificial language can provide some interesting extensions for the evaluation of text-to-speech (TTS) systems. For the alternative evaluation of the TTS system DRESS, a new module for the artificial language Klingon has been developed. The linguistic and phonetic structure of Klingon can be modeled mainly by rules, with few exceptions. This contribution introduces the multi...

An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit

Concatenative speech synthesis using the phoneme, diphone, or allophone as an elementary unit for Hindi speech synthesis requires significant quality improvement. The naturalness of the state-of-the-art waveform synthesizer is attributed to the use of the syllable as a basic unit. The primary reason for choosing the syllable as a basic unit is that the Indian languages are syllable centered. This ...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Publication date: 1987